Treebanks in Machine Translation

Authors

Martin Čmejrek

Jan Cuřín

Jiří Havelka

Abstract

We present an approach to using treebanks in machine translation. Our Czech-English experiment is an attempt to develop a full machine translation system based on dependency trees (Dependency Based Machine Translation, DBMT). We use the following resources: the Prague Dependency Treebank, a newly created Czech-English parallel corpus of the Penn Treebank, an English monolingual corpus, and translation lexicons. The fully automatic process comprises analysis of the Czech input into a tectogrammatical (semantic) representation, lexical and structural transfer, a simple rule-based system for generating the English surface realization, and an n-gram language model for scoring and choosing among translation hypotheses. The results are evaluated quantitatively using the BLEU score.
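The final step of the pipeline, scoring candidate translations with an n-gram language model and keeping the best one, can be illustrated with a minimal sketch. The abstract does not state the model order, the smoothing method, or the training data, so everything below is an assumption made for illustration: a bigram model with add-one smoothing, a three-sentence toy corpus, and the hypothetical function names `train_bigram_counts`, `log_prob`, and `choose_best`.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect unigram and bigram counts from whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under a bigram model with add-one smoothing.

    Add-one smoothing is an illustrative assumption, not the paper's method.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def choose_best(hypotheses, unigrams, bigrams):
    """Return the hypothesis to which the language model assigns the highest score."""
    return max(hypotheses, key=lambda h: log_prob(h, unigrams, bigrams))


# Toy English monolingual corpus (invented for this example).
corpus = [
    "the government approved the budget",
    "the budget was approved by the parliament",
    "the parliament approved the law",
]
unigrams, bigrams = train_bigram_counts(corpus)

# Two candidate outputs with the same words in different orders.
hypotheses = [
    "the parliament approved the budget",
    "parliament the budget approved the",
]
best = choose_best(hypotheses, unigrams, bigrams)
print(best)
```

Because both hypotheses contain the same words, only word order separates them, and the fluent ordering wins since all of its bigrams occur in the toy corpus. A real system would train on a large monolingual corpus and would likely use a higher-order model with stronger smoothing.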

Similar resources

Automatic Generation of Parallel Treebanks

The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce a novel platform for fast and robust automatic generation of paral...

Large aligned treebanks for syntax-based machine translation

We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we ...

Resourcing Machine Translation with Parallel Treebanks

Morphologically and Syntactically Annotated Corpora of Many Languages

Annotated corpora have become a standard resource for research in both linguistics and computational processing of natural languages. Lexicographers judge word usage and distribution by occurrences in corpora; part-of-speech tags may help them narrow their queries. Grammarians may use syntactically annotated corpora (treebanks) for queries such as “show me all examples where a verb governs two ...

Unsupervised Generation of Parallel Treebanks through Sub-Tree Alignment

The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce an open-source system for fast and robust automatic generation of para...

Publication date: 2003
